WHAT IS CLAIMED IS: 



1. A method for identifying a media file, the method comprising:

searching a collection of machine readable data to locate an unknown media
file therein;

generating a media file identifier for an unknown media file located in the
collection of machine readable data;

determining an address of the unknown media file in the collection of machine
readable data;

storing the media file identifier for the unknown media file in a database;

storing the address of the unknown media file in a database; and

associating the stored address of the unknown media file with the stored media
file identifier for the unknown media file.
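
As an illustration only, the storing and associating steps recited above could be sketched as follows. The table layout, the function names (`open_database`, `index_media_file`, `addresses_for`) and the use of a SHA-256 digest as a stand-in identifier are all assumptions made for this sketch; the claim itself does not prescribe a schema or a particular identifier generating algorithm.

```python
import hashlib
import sqlite3


def generate_identifier(data: bytes) -> str:
    # Placeholder identifier (assumption): a SHA-256 digest of the raw bytes.
    # The claim leaves the identifier generating algorithm open.
    return hashlib.sha256(data).hexdigest()


def open_database() -> sqlite3.Connection:
    # Hypothetical schema: one table holds identifiers, one holds addresses,
    # and ident_id associates each stored address with a stored identifier.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE identifiers (id INTEGER PRIMARY KEY, ident TEXT UNIQUE)"
    )
    db.execute(
        "CREATE TABLE addresses ("
        "address TEXT PRIMARY KEY, "
        "ident_id INTEGER REFERENCES identifiers(id))"
    )
    return db


def index_media_file(db: sqlite3.Connection, address: str, data: bytes) -> str:
    # Generate the identifier, store it, store the address, and link the two.
    ident = generate_identifier(data)
    db.execute("INSERT OR IGNORE INTO identifiers (ident) VALUES (?)", (ident,))
    row = db.execute(
        "SELECT id FROM identifiers WHERE ident = ?", (ident,)
    ).fetchone()
    db.execute(
        "INSERT OR REPLACE INTO addresses (address, ident_id) VALUES (?, ?)",
        (address, row[0]),
    )
    return ident


def addresses_for(db: sqlite3.Connection, ident: str) -> list:
    # Follow the association from an identifier back to every stored address.
    rows = db.execute(
        "SELECT a.address FROM addresses a "
        "JOIN identifiers i ON a.ident_id = i.id "
        "WHERE i.ident = ? ORDER BY a.address",
        (ident,),
    )
    return [r[0] for r in rows]
```

In this sketch two addresses that serve the same bytes end up associated with a single stored identifier, which is one simple way to realise the association step.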



2. A method for identifying a media file as defined in claim 1, wherein the
machine readable data resides on a computer network. 

3. A method for identifying a media file as defined in claim 2, wherein the 
computer network includes the Internet. 

4. A method for identifying a media file as defined in claim 3, wherein the 
searching is accomplished by a crawler. 

5. A method for identifying a media file as defined in claim 4, wherein the 
crawler is capable of searching a network site based on an address for the network site 
provided by an administrator. 

6. A method for identifying a media file as defined in claim 5, wherein the 
crawler is further capable of analyzing the machine readable data residing on the network site 
to generate an address of another network site to be searched. 
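
By way of a hedged example, a crawler of the kind recited in claims 5 and 6 might start from an administrator-supplied address and derive further addresses from the data found there. The sketch below assumes the machine readable data is HTML and treats "another network site" as any link whose host differs from the seed host; both are assumptions, and `LinkCollector` and `next_sites` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of anchor tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def next_sites(seed_url: str, page_html: str) -> list:
    # Analyze the page found at the seed address and return addresses
    # on other hosts, in order of first appearance, without duplicates.
    collector = LinkCollector(seed_url)
    collector.feed(page_html)
    seed_host = urlparse(seed_url).netloc
    found = []
    for link in collector.links:
        host = urlparse(link).netloc
        if host and host != seed_host and link not in found:
            found.append(link)
    return found
```

The sketch performs no network access; fetching the page at the seed address is left to whatever transport the crawler uses.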

7. A method for identifying a media file as defined in claim 6, wherein the 
generating of the media file identifier for the unknown media file utilizing the identifier 
generating algorithm is accomplished by downloading the unknown media file and then
analyzing the unknown media file with the identifier generating algorithm. 

8. A method for identifying a media file as defined in claim 6, wherein the 
unknown media file is a streaming media file and wherein the generating of the media file 
identifier for the unknown media file utilizing the identifier generating algorithm is 
accomplished by playing the unknown media file as a stream of data and analyzing the 
stream of media data with the identifier generating algorithm as the stream is received by the 
crawler. 

9. A method for identifying a media file as defined in claim 1, wherein the
unknown media file is an audio file and wherein the identifier generating algorithm is an
up-down coding algorithm.

10. A method for identifying a media file as defined in claim 1, wherein the unknown
media file is a video file and wherein the identifier generating algorithm is a word count 
algorithm. 

11. A method for identifying a media file as defined in claim 1, wherein the
generating of the media file identifier for the unknown media file is accomplished utilizing an 
identifier generating algorithm, the method further comprising providing a query media file, 
generating a media file identifier for the query media file utilizing the identifier generating 
algorithm, comparing the media file identifier for the query file with the media file identifier 
for the unknown media file in order to determine if the respective media files from which the 
query media file identifier and the unknown media file identifier were generated have 
identical media content, and providing the location of the unknown media file in response to 
a determination that the query media file identifier and the unknown media file identifier 
were generated from media files having identical media content. 
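
The query flow of claim 11 can be pictured with a short sketch: the same identifier generating algorithm is applied to the query file and to the indexed files, and a match returns the stored locations. The SHA-256 digest used here is only a placeholder assumption; it matches byte-identical files, whereas the claim speaks of identical media content. The name `locate_matches` and the dictionary-based index are likewise hypothetical.

```python
import hashlib


def generate_identifier(data: bytes) -> str:
    # Placeholder for the identifier generating algorithm (assumption).
    return hashlib.sha256(data).hexdigest()


def locate_matches(query_data: bytes, index: dict) -> list:
    # index maps a stored identifier to the addresses of the files
    # from which that identifier was generated.
    query_ident = generate_identifier(query_data)
    # Identifiers are compared by equality; a match yields the locations.
    return list(index.get(query_ident, []))
```

For example, with `index = {generate_identifier(b"song"): ["http://example.com/song.mp3"]}`, calling `locate_matches(b"song", index)` returns the stored address, while a query with different bytes returns an empty list.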

12. A method for identifying a media file as defined in claim 11, further
comprising providing metadata that includes information sufficient to identify the unknown 
media file, storing the metadata in a database and associating the metadata with the unknown 
media file. 

13. A method for identifying a media file as defined in claim 1, further comprising
providing a media file identifier for a known media file and comparing the media file 
identifier for the unknown media file with the media file identifier for the known media file 
in order to determine if the respective media files from which the known media file identifier 
and the unknown media file identifier were generated have identical media content. 

14. A method for identifying a media file as defined in claim 13, wherein the 
machine readable data resides on a computer network. 

15. A method for identifying a media file as defined in claim 14, wherein the 
generating of a media file identifier for an unknown media file is accomplished by analyzing 
the unknown media file with an identifier generating algorithm and wherein the providing a 
media file identifier for a known media file is accomplished by analyzing the known media 
file with the identifier generating algorithm. 

16. A method for identifying a media file as defined in claim 15, wherein the 
unknown media file is an audio file and wherein the identifier generating algorithm is an
up-down coding algorithm.

17. A method for identifying a media file as defined in claim 15, wherein the 
unknown media file is a video file and wherein the identifier generating algorithm is a word 
count algorithm. 

18. A method for identifying a media file as defined in claim 1, further including,
prior to the searching the collection of machine readable data, generating a media file
identifier for a known media file and storing the media file identifier for the known media file
in a database and further including, after the associating, comparing the media file identifier
for the unknown media file with the media file identifier for the known media file in order to
determine if the respective media files from which the known media file identifier and the
unknown media file identifier were generated have identical media content.

19. A method for identifying a media file as defined in claim 18, further
comprising providing metadata that includes information sufficient to identify the known 
media file, storing the metadata in a database and associating the metadata with the known
media file. 

20. A method for identifying a media file as defined in claim 19, wherein the
searching is accomplished by a crawler. 

21. A method for identifying a media file as defined in claim 20, wherein the 
crawler is capable of searching a network site based on an address for the network site 
provided by an administrator. 

22. A method for identifying a media file as defined in claim 21, wherein the
crawler is further capable of analyzing the machine readable data residing on the network site 
to generate an address of another network site to be searched. 

23. A method for identifying a media file as defined in claim 22, wherein the
generating of the media file identifier for the unknown media file utilizing the identifier 
generating algorithm is accomplished by downloading the unknown media file and then 
analyzing the unknown media file with the identifier generating algorithm. 

24. A method for identifying a media file as defined in claim 22, wherein the 
unknown media file is a streaming media file and wherein the generating of the media file 
identifier for the unknown media file utilizing the identifier generating algorithm is 
accomplished by playing the unknown media file as a stream and analyzing the stream of 
media data with the identifier generating algorithm as the stream is received by the crawler. 

25. A method for identifying a media file as defined in claim 22, wherein the 
crawler is implemented by a plurality of computers distributed throughout the computer 
network and wherein the searching is accomplished by the plurality of computers searching 
the network site simultaneously. 

26. A method for identifying a media file as defined in claim 25, wherein the 
crawler controls the plurality of computers so as to mimic the behavior of a human user 
searching the network site. 

27. A method for identifying a media file resident on a network, the method
comprising: 

creating a media file identifier by analyzing a known media file using an identifier 
generating algorithm; 

storing the known media file identifier in a database; 

creating a media file identifier for an unknown media file with the identifier 
generating algorithm; and 

comparing the media file identifier for the unknown media file with the known media file
identifier in order to determine if the respective media files from which the unknown media 
file identifier and the known media file identifier were generated include identical media 
content. 

28. The method of claim 27, further comprising: 

storing in the database metadata for the known media file, the metadata providing
information sufficient to identify the known media file for which it is stored; and

associating the metadata for the known media file with the known media file identifier
so that the identity of the unknown media file can be determined in the event that the unknown
media file identifier and the known media file identifier are determined to have been generated
from media files having identical media content.

29. The method of claim 28, wherein: 

the unknown media file and the known media file are video files; 

each media file contains computer readable code encoding a series of images; 

the images in each series in each media file are encoded as a plurality
of words;

the creating the known media file identifier comprises counting the words 
used to encode selected images of the known video file; 

the creating a media file identifier for the unknown media file comprises 
counting the words used to encode selected images thereof; and 

the comparing comprises comparing the number of words used to encode the 
selected images of the known media file with the number of words used to encode the 
selected images of the unknown media file. 
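
A minimal sketch of the word count comparison recited in claim 29 follows. It assumes each encoded image is available as a byte string and that a "word" is a fixed-size unit of `WORD_SIZE` bytes; the word size, the choice of selected images and the function names are assumptions for illustration, not details taken from the claims.

```python
WORD_SIZE = 4  # bytes per word; an assumption made for this sketch


def word_counts(encoded_images, selected):
    # Count the words used to encode each selected image.
    # -(-n // d) is ceiling division, so a partial trailing word counts as one.
    return [-(-len(encoded_images[i]) // WORD_SIZE) for i in selected]


def same_content(known_images, unknown_images, selected):
    # Compare the per-image word counts of the known and unknown files.
    return word_counts(known_images, selected) == word_counts(
        unknown_images, selected
    )
```

Only the number of words is compared, never the pixel content, so the comparison stays cheap even for long video files.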

30. The method of claim 29, wherein the images in each series of images are
encoded as a GOP. 

31. The method of claim 29, wherein the creating a media file identifier for an
unknown media file comprises sequentially generating word counts for selected successive 
images in the unknown media file, and the comparing comprises comparing the word counts 
of images of the unknown media file as they are generated with the word counts of 
corresponding images of the known media files; and terminating the generating and the 
comparing if a sufficiently close match is found, or when the word count of each unknown 
image in the sequence of unknown images has been compared. 
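
The sequential generate-and-compare loop of claim 31 might look like the sketch below. Word counts for the unknown file are produced lazily, each is compared with the corresponding known count as it arrives, and the loop stops early once a match is declared. Treating "sufficiently close" as `needed` consecutive counts within `tolerance` is an assumption of this sketch, as are the parameter names.

```python
def sequential_match(unknown_counts, known_counts, needed=3, tolerance=0):
    # unknown_counts may be a generator, so counts are only generated
    # for as long as the comparison is still running.
    run = 0
    for unknown, known in zip(unknown_counts, known_counts):
        if abs(unknown - known) <= tolerance:
            run += 1
            if run >= needed:
                # Sufficiently close match found: terminate both the
                # generating and the comparing.
                return True
        else:
            run = 0
    # Every available image in the sequence has been compared without a match.
    return False
```

Passing a generator for `unknown_counts` means that no further images of the unknown file are decoded once the function returns.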

32. The method of claim 28 wherein the unknown media file and each known 
media file is an audio file. 

33. The method of claim 32 wherein the content of each audio file includes an
encoded audio signal, and the identifier generating algorithm is an up-down coding 
algorithm. 
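
The claims name an up-down coding algorithm without defining it in this section. Purely as an assumption for illustration, the sketch below reads "up-down coding" as recording, for each decoded audio sample, whether the signal rose relative to the previous sample; the actual algorithm intended by the claims may differ.

```python
def up_down_code(samples):
    # Assumed interpretation: emit 1 when the signal goes up relative to
    # the previous sample, and 0 when it goes down or stays level.
    bits = []
    for previous, current in zip(samples, samples[1:]):
        bits.append(1 if current > previous else 0)
    return bits
```

Under this reading the code depends only on the shape of the waveform, so scaling every sample by the same positive gain leaves the resulting bit sequence unchanged.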

34. The method of claim 33 wherein the up-down coding algorithm is used on the 
entire audio file. 

35. The method of claim 33 wherein the up-down coding algorithm is used on
only a portion of the audio file. 

36. The method of claim 31 wherein the network is the Internet and further
comprising accessing the unknown media file via the Internet. 

37. The method of claim 36 wherein the accessing is performed in a coordinated 
manner by a crawler from a plurality of web addresses. 

38. An apparatus for identifying a media file residing on a network, the 
apparatus comprising: 

at least one module configured to create a plurality of known media file
identifiers, each for a respective one of a plurality of known media files, using an identifier 
generating algorithm; 

a database configured to store the known media file identifiers; 

at least one module configured to create a media file identifier for an unknown 
media file with the identifier generating algorithm; and 

at least one module configured to compare the media file identifier for the 
unknown media file with known media file identifiers to determine if the respective media 
files from which the known media file identifier and the unknown media file identifier were 
generated have identical media content. 

39. An apparatus as defined in claim 38, wherein the database is also configured 
to store metadata for each known media file in association with the corresponding known 
media file and known media file identifier located in the database. 

40. An apparatus as defined in claim 39, wherein the unknown media file and each 
known media file are video files, each media file containing computer readable code 
encoding a series of images including a plurality of words; wherein the apparatus further 
includes at least one module configured to create known media file identifiers by counting the 
words used to encode selected images of each known video file, wherein the apparatus 
further includes at least one module configured to create a media file identifier for an 
unknown media file by counting the words used to encode selected images thereof, and 

wherein the at least one module for comparing is configured to compare the number 
of words used to encode selected images of each known media file with the number of words 
used to encode selected images of the unknown media file. 

41. An apparatus as defined in claim 40, wherein the images in each series of
images are encoded as a GOP. 

42. An apparatus as defined in claim 40, wherein the at least one module 
configured to create a media file identifier for an unknown media file includes at least one 
module for sequentially generating word counts for successive images in the unknown media 
file, 
wherein the at least one module configured to create media file identifiers for a known
media file includes at least one module configured to generate word counts for successive 
images in each known media file and the at least one module configured to compare includes 
at least one module configured to compare each unknown image identifier to corresponding 
image identifiers for the known media files as they are generated, and wherein the apparatus 
further includes at least one module configured to terminate the at least one module 
configured to generate and the at least one module configured to compare when at least one 
sufficiently close match is found, or when each unknown image identifier in the sequence
of unknown image identifiers has been compared. 

43. An apparatus as defined in claim 39 wherein the unknown media file and each 
known media file is an audio file. 

44. An apparatus as defined in claim 43 wherein the content of each audio file 
includes an audio signal, and the identifier generating algorithm is an up-down coding 
algorithm. 

45. An apparatus as defined in claim 44 wherein the up-down coding algorithm is 
used on the entire audio file. 

46. An apparatus as defined in claim 44 wherein the up-down coding algorithm is 
used on only a portion of the audio file. 

47. An apparatus as defined in claim 38 wherein the network is the Internet and 
further comprising accessing the unknown media file via the Internet. 

48. An apparatus as defined in claim 47 wherein the means for accessing
the Internet comprises at least one module configured to access the Internet in a coordinated
manner from a plurality of web addresses.